Wideband Speech Recovery from Narrowband Speech Using Classified Codebook Mapping

Authors

  • Yasheng Qian
  • Peter Kabal
Abstract

Speech sounds occupy 8 kHz or more of bandwidth. However, current public telephone networks limit the speech bandwidth to 300–3400 Hz. Telephone speech is therefore characterized by thin, muffled sound and degraded speaker identification. We describe an algorithm that generates the missing highband components from the narrowband speech signal. The algorithm uses three acoustic-phonetic classified narrowband-to-wideband linear prediction (LP) spectrum mapping codebooks to recover the missing highband spectrum. Subjective tests show that the reconstructed wideband speech improves the speech quality. LP spectrum bandwidth expansion is used to avoid sharp spectral peaks. The mean log-spectrum distortion (SD) decreases by 0.93 dB compared with non-classified codebooks without LP bandwidth expansion.

INTRODUCTION

Current public telephone networks provide voice services for narrowband speech (300–3400 Hz). Since the human voice occupies 8 kHz or more of bandwidth, telephone speech sounds thin and muffled, which degrades intelligibility. For example, it is very difficult to distinguish between the phonemes /s/ and /f/, because their highband components (above 3.4 kHz) are important discriminators. Most other phonemes also lose some of their fidelity in telephone speech.

The coming 3G wireless communication systems will employ advanced speech coding technology, the Adaptive Multi-Rate (AMR) codec, to provide wideband speech coding. However, a connection between the existing PSTN and a 3G wireless system degrades the quality of service because of the bandwidth bottleneck of the PSTN. A promising solution is to use a wideband recovery technique to improve the service quality when a PSTN signal passes to a wideband network.

Several attempts have been made to address the wideband recovery issue. These methods generate an excitation signal which passes through a synthesis filter.
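The excitation-plus-synthesis-filter structure, together with the LP bandwidth expansion mentioned in the abstract, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the example coefficients, and the expansion factor `gamma = 0.9` are assumptions chosen for the example (the text does not state the factor used). Bandwidth expansion is shown in its common form, scaling the k-th LP coefficient by gamma to the power k, which pulls the filter poles toward the origin and flattens sharp spectral peaks.

```python
import random


def expand_bandwidth(lp_coeffs, gamma):
    """Scale coefficient a_k by gamma**k (a_0 = 1 is unchanged).

    Each pole of 1/A(z) is moved from radius r to radius gamma * r,
    which widens the corresponding spectral peak.
    """
    return [a * gamma ** k for k, a in enumerate(lp_coeffs)]


def synthesize(excitation, lp_coeffs):
    """Pass an excitation through the all-pole synthesis filter 1/A(z).

    lp_coeffs = [1, a_1, ..., a_p] with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k in range(1, len(lp_coeffs)):
            if n - k >= 0:
                y -= lp_coeffs[k] * out[n - k]
        out.append(y)
    return out


# Hypothetical second-order filter with a sharp resonance (pole radius ~0.985).
a = [1.0, -1.8, 0.97]
# Assumed expansion factor; the pole radius shrinks to roughly 0.886.
a_bw = expand_bandwidth(a, gamma=0.9)

# Gaussian white noise as a stand-in excitation for one 160-sample frame.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(160)]
frame = synthesize(noise, a_bw)
```

In a recovery system of the kind described here, the LP coefficients would come from the mapped highband spectral envelope rather than from fixed example values.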
There are two main issues in reconstructing the missing highband components: reconstruction of the highband excitation and reconstruction of the highband spectral envelope (via the synthesis filter). One approach is to use a mapping between a codebook of narrowband spectra and a codebook of wideband spectra [Epps, 1998], [Jax, 2002]. However, since different speech sounds show a large divergence in their spectral envelope characteristics between narrowband and wideband speech, a single mapping codebook can result in large spectrum mapping errors. Our new approach for highband spectrum recovery is based on three acoustic-phonetic classified narrowband-to-highband spectrum mapping codebooks. The missing highband excitation can be regenerated by bandpass envelope-modulated Gaussian white noise with a modulation gain obtained by mapping [Qian & Kabal, 2002].

We first introduce the acoustic-phonetic classification of speech sounds. Then, we describe the wideband recovery system with three classified lowband-to-highband spectral envelope mapping codebooks and three excitation gain mapping codebooks. Simulation results for the wideband reconstruction, namely objective measurements of the mean spectrum distortion (SD) and spectrograms, are given in the last section.

ACOUSTIC PHONETICS FOR CLASSIFICATION

Acoustic phonetics describes the distinctive waveform, spectrum and power properties of speech sounds, or phonemes. We consider two parameters to classify phonemes: the highband (4 kHz to 8 kHz) to lowband (below 4 kHz) energy ratio, r_g, and the voicing periodicity, or pitch predictor gain, β. The ratio r_g characterizes the energy of the missing highband. The parameter β measures the degree of waveform periodicity, or harmonic structure. It also correlates with the highband spectrum envelope. For our application, the 42 phonemes of American English can be classified into 3 groups using the two parameters.

  • Five unvoiced fricative phonemes, /s/, /f/, /θ/, /sh/, /∫/, the whisper /h/, and 2 affricates, /j/ and /t∫/. They have large highband-to-lowband energy ratios and no harmonics in the spectrum (small pitch gain). The parameter r_g varies between 9.0 and 25.0 dB. The parameter β is in the range 0.1–0.25. A typical spectral envelope of the phoneme /s/ is shown in Fig. 1.
  • Four voiced fricatives and 6 stop (plosive) consonant phonemes: /v/, /th/, /z/, /zh/ and /p/, /t/, /k/, /b/, /d/, /g/. Their highband-to-lowband energy ratio is moderate, in the range of −13.0 to 9.0 dB. Their waveforms are periodic to a certain degree. The parameter β is in the range 0.25–0.90. The voiced phonemes have β close to the upper boundary. The spectrum envelope of the phoneme /k/ is depicted in Fig. 2.
  • The other 11 vowels, 6 diphthongs, 4 semivowels and 3 nasal consonant phonemes show a small highband-to-lowband energy ratio (r_g < −15 dB) and very good periodicity in the waveform (pitch gain β > 0.9). The spectrum envelope of the phoneme /o/ is illustrated in Fig. 3.

The voicing degree parameter β plays an important role in the classification. We have used the voicing periodicity parameter to approximately classify the speech frames into three groups:

\[
\begin{cases}
\beta \le 0.25 & \text{unvoiced phonemes} \\
0.25 < \beta < 0.9 & \text{mixed phonemes} \\
0.9 \le \beta & \text{voiced phonemes}
\end{cases}
\]

HIGHBAND RESTORATION

The key part of a wideband reconstruction system is the highband spectrum envelope reproduction, as depicted in Fig. 4.

Source: Qian et al., Wideband Speech Recovery, Proceedings of the 9th Australian International Conference on Speech Science & Technology, Melbourne, December 2 to 5, 2002. Australian Speech Science & Technology Association Inc. Accepted after abstract review. Page 107.
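The three-way frame classification by the voicing parameter β can be sketched as below. The thresholds 0.25 and 0.9 come from the text; everything else is an assumption made for illustration: the function names, the treatment of the exact boundary values, the lag search range, and the use of a single-tap pitch predictor (β chosen at the lag that gives the largest prediction gain), since the text does not spell out how β is computed.

```python
import math

UNVOICED_MAX = 0.25  # threshold from the text
VOICED_MIN = 0.9     # threshold from the text


def pitch_gain(frame, min_lag, max_lag):
    """Single-tap pitch predictor gain (assumed definition).

    For each lag T, the optimal gain is sum(x[n] x[n-T]) / sum(x[n-T]^2).
    The lag that removes the most prediction-error energy is selected.
    """
    best_score, best_gain = 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        den = sum(frame[n - lag] ** 2 for n in range(lag, len(frame)))
        if num <= 0.0 or den <= 0.0:
            continue  # no useful positive correlation at this lag
        score = num * num / den  # energy removed by the predictor
        if score > best_score:
            best_score, best_gain = score, num / den
    return best_gain


def classify_frame(beta):
    """Map the pitch gain to one of the three classes named in the text."""
    if beta <= UNVOICED_MAX:
        return "unvoiced"
    if beta < VOICED_MIN:
        return "mixed"
    return "voiced"


# Hypothetical test frame: a pure tone with a 40-sample period, 240 samples.
tone = [math.sin(2.0 * math.pi * n / 40.0) for n in range(240)]
beta = pitch_gain(tone, min_lag=20, max_lag=120)
label = classify_frame(beta)  # a strongly periodic frame lands in "voiced"
```

The class label would then select which of the three spectral envelope mapping codebooks (and which excitation gain codebook) is searched for the frame.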


Related resources

Pseudo-wideband Speech Reconstruction from Telephone Speech

The bandwidth of telephone speech is limited to a 300 – 3400 Hz bandwidth. The sound quality is much lower than for broadcast radio and audio compact discs. We present an algorithm to regenerate the missing highband components (3.4–7 kHz). The highband spectrum recovery is based on a Line Spectrum Frequency (LSF) VQ codebook mapping from the narrowband speech to the high frequency components. T...


Speech enhancement using STC-based bandwidth extension

Telephone speech is typically bandlimited to 4 kHz, resulting in a ‘muffled’ quality. Coding speech with bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider bandwidth speech is to exploit correlation between the 0-4 kHz and 4-8 kHz speech bands to resynthesize wideband speech from narrowband spee...


Wideband re-synthesis of narrowband CELP-coded speech using multiband excitation model

In this paper, a method for improving the quality of narrowband CELP-coded speech is presented. The approach is to reduce the hoarse voice in CELP-coded speech by enhancing the pitch periodicity in the reproduction signal and also to reduce the muffled character of narrowband speech by regenerating the highband components of speech spectra from the reproduction signal. In the proposed metho...


Generation of broadband speech from narrowband speech using piecewise linear mapping

This paper proposes a method for recovering broadband speech from narrowband speech based on piecewise linear mapping. In this method, the narrowband spectrum envelope of the input speech is transformed to a broadband spectrum envelope using linear transformation matrices which are associated with several spectrum spaces. These matrices were estimated from speech training data, so as to minimize the mean squar...


Speech bandwidth extension by improved codebook mapping towards increased phonetic classification

Bandwidth limitation (0–4 kHz) is a major source of degradation in the performance of current speech communication systems. Narrowband speech provides much lower quality and intelligibility than wideband speech (0–8 kHz). Speech bandwidth extension technology has recently been investigated with the aim of artificially regenerating the missing high-band speech signal. This paper describes a robust speech...




Publication date: 2002